Narrowband to wideband feature expansion for robust multilingual ASR

Author

  • Dusan Macho
Abstract

To build high-quality wideband acoustic models for automatic speech recognition (ASR), a large amount of wideband speech training data is required. For a particular language, however, a lot of narrowband data may be available but only a limited amount of wideband data. This paper addresses that situation and proposes a narrowband-to-wideband expansion algorithm that expands narrowband ASR features to wideband ASR features. The algorithm is tested in two practical situations: one with a sufficient amount of original wideband training data and one with an insufficient amount. Tests show that combining wideband features with expanded features does not harm ASR performance when sufficient original wideband data is available, and that it improves ASR performance significantly when only a limited amount of wideband data is originally available. In the presented multilingual tests, a single expansion model is trained for four languages from the Speecon database. Different amounts of available wideband training data are considered, including the case where no wideband data is available. ASR experiments for each language confirm that adding expanded features to wideband model training enhances the models and gives better results than using the limited wideband data alone. In all tests, the ETSI standard noise-robust front-end is used.
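The abstract does not say how the expansion algorithm works internally, so the following is only a minimal sketch of the general idea, not the paper's method. It assumes a nearest-neighbour codebook mapping (a classic bandwidth-extension technique, swapped in here for illustration): each narrowband feature vector is replaced by the wideband vector paired with its closest narrowband codeword, and the expanded vectors are then pooled with the original wideband features for model training. The names `expand_features`, `build_training_pool`, and `paired_codebook`, as well as the toy feature values, are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def expand_features(
    narrowband_frames: List[Vector],
    paired_codebook: List[Tuple[Vector, Vector]],
) -> List[Vector]:
    """Replace each narrowband frame by the wideband vector paired with its
    nearest narrowband codeword (assumed mapping, not the paper's algorithm)."""
    expanded = []
    for frame in narrowband_frames:
        _, wideband = min(
            paired_codebook,
            key=lambda pair: squared_distance(frame, pair[0]),
        )
        expanded.append(wideband)
    return expanded


def build_training_pool(
    wideband_frames: List[Vector],
    narrowband_frames: List[Vector],
    paired_codebook: List[Tuple[Vector, Vector]],
) -> List[Vector]:
    """Pool original wideband features with expanded narrowband features."""
    return list(wideband_frames) + expand_features(narrowband_frames, paired_codebook)


# Toy data: 2-dimensional narrowband features, 3-dimensional wideband features.
paired_codebook = [
    ((0.0, 0.0), (0.0, 0.0, 1.0)),
    ((1.0, 1.0), (1.0, 1.0, 2.0)),
]
narrowband = [(0.1, -0.1), (0.9, 1.2)]
wideband = [(0.5, 0.5, 1.5)]

expanded = expand_features(narrowband, paired_codebook)
pool = build_training_pool(wideband, narrowband, paired_codebook)
```

In a multilingual setting like the one described, the paired codebook would presumably be estimated once from parallel narrowband and wideband features and then shared across languages, which is why the mapping is kept separate from the pooling step.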

Similar resources

DNN-based speech bandwidth expansion and its application to adding high-frequency missing features for automatic speech recognition of narrowband speech

We propose a number of enhancement techniques to improve speech quality in bandwidth expansion (BWE) from narrowband to wideband speech, addressing three issues, which could be critical in real-world applications, namely: (1) discontinuity between narrowband spectrum and the estimated high frequency spectrum, (2) energy mismatch between testing and training utterances, and (3) expanding bandwid...

Windowing Effects of Short Time Fourier Transform on Wideband Array Signal Processing Using Maximum Likelihood Estimation

During the last two decades, Maximum Likelihood estimation (ML) has been used to determine Direction Of Arrival (DOA) and signals propagated by the sources, using narrowband array signals. The algorithm fails in the case of wideband signals. As an attempt by the present study to overcome the problem, the array outputs are transformed into narrowband frequency bins, using short time Fourier tran...

Robust Bandwidth Extension of Noise-co

We present a new bandwidth extension algorithm for converting narrowband telephone speech into wideband speech using a transformation in the mel cepstral domain. Unlike previous approaches, the proposed method is designed specifically for bandwidth extension of narrowband speech that has been corrupted by environmental noise. We show that by exploiting previous research in mel cepstrum feature ...

WTIMIT: The TIMIT Speech Corpus Transmitted Over The 3G AMR Wideband Mobile Network

In anticipation of upcoming mobile telephony services with higher speech quality, a wideband (50 Hz to 7 kHz) mobile telephony derivative of TIMIT has been recorded called WTIMIT. It opens up various scientific investigations; e.g., on speech quality and intelligibility, as well as on wideband upgrades of network-side interactive voice response (IVR) systems with retrained or bandwidth-extended...

Publication date: 2007